Improving Lexical Alignment Using Hybrid Discriminative and Post-Processing Techniques
Abstract
Automatic lexical alignment is a vital step for empirical machine translation. Although good results can be obtained with existing models (e.g., Giza++), more precise alignment is still needed to handle complex constructions such as multiword expressions successfully. In this paper, we propose an approach to lexical alignment that combines statistical and linguistic information. We describe the development of a baseline discriminative aligner and a set of language-dependent post-processing functions that allow shallow linguistic knowledge to be incorporated. The post-processing functions were designed to significantly improve word alignment, mainly on verb-particle constructions, over both our baseline and Giza++.
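The abstract does not specify the post-processing functions themselves. As a rough illustration only, here is what one language-dependent rule for verb-particle constructions might look like. The sketch assumes Penn Treebank POS tags on the source side (tags starting with `VB` for verbs, `RP` for particles) and alignments stored as sets of (source index, target index) pairs. The function name `attach_particles`, the `max_gap` window, and the rule itself are hypothetical and are not the authors' implementation.

```python
from typing import List, Set, Tuple

# An alignment is a set of (source index, target index) links.
Alignment = Set[Tuple[int, int]]


def attach_particles(
    src_tags: List[str],
    alignment: Alignment,
    verb_prefix: str = "VB",   # assumed Penn Treebank verb tags (VB, VBD, VBZ, ...)
    particle_tag: str = "RP",  # assumed Penn Treebank particle tag
    max_gap: int = 3,          # hypothetical: how many tokens back to look for the verb
) -> Alignment:
    """Hypothetical post-processing rule: link each unaligned source particle
    to the same target words as the nearest preceding verb."""
    result = set(alignment)  # do not mutate the caller's alignment
    aligned_src = {s for s, _ in alignment}
    for i, tag in enumerate(src_tags):
        if tag != particle_tag or i in aligned_src:
            continue  # only unaligned particles are repaired
        # Scan backwards over at most max_gap tokens for the governing verb.
        for j in range(i - 1, max(i - 1 - max_gap, -1), -1):
            if src_tags[j].startswith(verb_prefix):
                verb_targets = {t for s, t in alignment if s == j}
                result.update((i, t) for t in verb_targets)
                break  # stop at the nearest verb, even if it is unaligned
    return result


if __name__ == "__main__":
    # Hypothetical pair: "She gave up smoking" / "Ela desistiu de fumar".
    tags = ["PRP", "VBD", "RP", "VBG"]
    links = {(0, 0), (1, 1), (3, 3)}  # "up" (index 2) is unaligned
    print(sorted(attach_particles(tags, links)))
```

In this made-up example, the particle "up" is linked to "desistiu" together with "gave", so the whole verb-particle construction maps to the single target verb. A rule of this kind only adds links and leaves existing ones untouched, which keeps it easy to layer on top of any baseline aligner's output.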
Similar resources
Addressing Problems across Linguistic Levels in SMT: Combining Approaches to Model Morphology, Syntax and Lexical Choice
Morphological complexity: data sparsity due to uncovered inflected forms; difficulty producing the correct target-side inflection from the available information. Combining approaches: pre-processing at the syntactic level (source-side reordering; Gojun and Fraser, 2012); at decoding time, at the lexical level, a discriminative classifier to score translation rules using source-side context (Tamchyna et al...
JU_NLP at SemEval-2016 Task 11: Identifying Complex Words in a Sentence
The complex word identification task refers to the process of identifying difficult words in a sentence from the perspective of readers belonging to a specific target audience. This task has immense importance in the field of lexical simplification. Lexical simplification helps in improving the readability of texts consisting of challenging words. As a participant of the SemEval-2016: Task 11 s...
Towards Accurate and Efficient Chinese Part-of-Speech Tagging
From the perspective of structural linguistics, we explore paradigmatic and syntagmatic lexical relations for Chinese POS tagging, an important and challenging task for Chinese language processing. Paradigmatic lexical relations are explicitly captured by word clustering on large-scale unlabeled data and are used to design new features to enhance a discriminative tagger. Syntagmatic lexical rela...
Using Cognates in a French-Romanian Lexical Alignment System: A Comparative Study
This paper describes a hybrid French-Romanian cognate identification module. This module is used by a lexical alignment system. Our cognate identification method uses lemmatized, tagged and sentence-aligned parallel corpora. This method combines statistical techniques, linguistic information (lemmas, POS tags) and orthographic adjustments. We evaluate our cognate identification module and we co...
Image Segmentation using Improved Imperialist Competitive Algorithm and a Simple Post-processing
Image segmentation is a fundamental step in many image processing applications. In most cases, the image's pixels are clustered based only on their intensity or color information, and neither spatial nor neighborhood information of pixels is used in the clustering process. Considering the importance of including spatial information of pixels which improves the quality of image segmentati...
Publication date: 2011